{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Scalable Kernel Interpolation for Product Kernels (SKIP)\n",
    "\n",
    "## Overview\n",
    "\n",
    "In this notebook, we'll overview of how to use SKIP, a method that exploits product structure in some kernels to reduce the dependency of SKI on the data dimensionality from exponential to linear. \n",
    "\n",
    "The most important practical consideration to note in this notebook is the use of `gpytorch.settings.max_root_decomposition_size`, which we explain the use of right before the training loop cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "import torch\n",
    "import gpytorch\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "# Make plots inline\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this example notebook, we'll be using the `elevators` UCI dataset used in the paper. Running the next cell downloads a copy of the dataset that has already been scaled and normalized appropriately. For this notebook, we'll simply be splitting the data using the first 80% of the data as training and the last 20% as testing.\n",
    "\n",
    "**Note**: Running the next cell will attempt to download a ~400 KB dataset file to the current directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import urllib.request\n",
    "import os\n",
    "from scipy.io import loadmat\n",
    "from math import floor\n",
    "\n",
    "\n",
    "# this is for running the notebook in our testing framework\n",
    "smoke_test = ('CI' in os.environ)\n",
    "\n",
    "\n",
    "if not smoke_test and not os.path.isfile('../elevators.mat'):\n",
    "    print('Downloading \\'elevators\\' UCI dataset...')\n",
    "    urllib.request.urlretrieve('https://drive.google.com/uc?export=download&id=1jhWL3YUHvXIaftia4qeAyDwVxo6j1alk', '../elevators.mat')\n",
    "\n",
    "\n",
    "if smoke_test:  # this is for running the notebook in our testing framework\n",
    "    X, y = torch.randn(1000, 3), torch.randn(1000)\n",
    "else:\n",
    "    data = torch.Tensor(loadmat('../elevators.mat')['data'])\n",
    "    X = data[:, :-1]\n",
    "    X = X - X.min(0)[0]\n",
    "    X = 2 * (X / X.max(0)[0]) - 1\n",
    "    y = data[:, -1]\n",
    "\n",
    "\n",
    "train_n = int(floor(0.8 * len(X)))\n",
    "train_x = X[:train_n, :].contiguous()\n",
    "train_y = y[:train_n].contiguous()\n",
    "\n",
    "test_x = X[train_n:, :].contiguous()\n",
    "test_y = y[train_n:].contiguous()\n",
    "\n",
    "if torch.cuda.is_available():\n",
    "    train_x, train_y, test_x, test_y = train_x.cuda(), train_y.cuda(), test_x.cuda(), test_y.cuda()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([16599, 18])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.size()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Defining the SKIP GP Model\n",
    "\n",
    "We now define the GP model. For more details on the use of GP models, see our simpler examples. This model uses a `GridInterpolationKernel` (SKI) with an RBF base kernel. To use SKIP, we make two changes:\n",
    "\n",
    "- First, we define our `base_covar_module` to have a batch_shape equal to the dimensionality of the data.\n",
    "  We make this change because we will the `base_covar_module` to construct a batch of univariate kernels\n",
    "  which we will then multiply using SKIP.\n",
    "- We use only a 1 dimensional `GridInterpolationKernel` (e.g., by passing `num_dims=1`). The idea of SKIP is to use a product of 1 dimensional `GridInterpolationKernel`s instead of a single `d` dimensional one.\n",
    "- In the `forward` call, we reshape `x` to be `d x n x 1` before passing it through the `covar_module`.\n",
    "  Our `covar_module` produces a batch of univariate kernels, and `x` must treat each dimension as a batch.\n",
    "- After constructing our univariate covariance matrices, we multiply them all together by calling `.prod(dim=-3)`.\n",
    "\n",
    "For more details on this construction, see the [Kernels with Additive or Product Structure tutorial](../00_Basic_Usage/kernels_with_additive_or_product_structure.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from gpytorch.means import ConstantMean\n",
    "from gpytorch.kernels import ScaleKernel, RBFKernel, GridInterpolationKernel\n",
    "from gpytorch.distributions import MultivariateNormal\n",
    "\n",
    "class GPRegressionModel(gpytorch.models.ExactGP):\n",
    "    def __init__(self, train_x, train_y, likelihood):\n",
    "        super(GPRegressionModel, self).__init__(train_x, train_y, likelihood)\n",
    "        self.mean_module = ConstantMean()\n",
    "        self.base_covar_module = RBFKernel(batch_shape=torch.Size([train_x.size(-1)]))\n",
    "        self.covar_module = ScaleKernel(\n",
    "            GridInterpolationKernel(self.base_covar_module, grid_size=100, num_dims=1)\n",
    "        )\n",
    "\n",
    "    def forward(self, x):\n",
    "        mean_x = self.mean_module(x)\n",
    "        univariate_covars = self.covar_module(x.mT.unsqueeze(-1))\n",
    "        covar_x = univariate_covars.prod(dim=-3)\n",
    "        return MultivariateNormal(mean_x, covar_x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "likelihood = gpytorch.likelihoods.GaussianLikelihood()\n",
    "model = GPRegressionModel(train_x, train_y, likelihood)\n",
    "\n",
    "if torch.cuda.is_available():\n",
    "    model = model.cuda()\n",
    "    likelihood = likelihood.cuda()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training the model\n",
    "\n",
    "The training loop for SKIP has one main new feature we haven't seen before: we specify the `max_root_decomposition_size`. This controls how many iterations of Lanczos we want to use for SKIP, and trades off with time and--more importantly--space. Realistically, the goal should be to set this as high as possible without running out of memory.\n",
    "\n",
    "In some sense, this parameter is the main trade-off of SKIP. Whereas many inducing point methods care more about the number of inducing points, because SKIP approximates one dimensional kernels, it is able to do so very well with relatively few inducing points. The main source of approximation really comes from these Lanczos decompositions we perform."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/gpleiss/workspace/linear_operator/linear_operator/utils/sparse.py:51: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()\n",
      "  if nonzero_indices.storage():\n",
      "/home/gpleiss/workspace/linear_operator/linear_operator/utils/sparse.py:66: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:78.)\n",
      "  res = cls(index_tensor, value_tensor, interp_size)\n",
      "/home/gpleiss/workspace/linear_operator/linear_operator/utils/sparse.py:66: UserWarning: torch.sparse.SparseTensor(indices, values, shape, *, device=) is deprecated.  Please use torch.sparse_coo_tensor(indices, values, shape, dtype=, device=). (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:621.)\n",
      "  res = cls(index_tensor, value_tensor, interp_size)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Iter 1/50 - Loss: 0.782\n",
      "Iter 2/50 - Loss: 0.767\n",
      "Iter 3/50 - Loss: 0.749\n",
      "Iter 4/50 - Loss: 0.733\n",
      "Iter 5/50 - Loss: 0.717\n",
      "Iter 6/50 - Loss: 0.699\n",
      "Iter 7/50 - Loss: 0.682\n",
      "Iter 8/50 - Loss: 0.665\n",
      "Iter 9/50 - Loss: 0.648\n",
      "Iter 10/50 - Loss: 0.631\n",
      "Iter 11/50 - Loss: 0.613\n",
      "Iter 12/50 - Loss: 0.596\n",
      "Iter 13/50 - Loss: 0.578\n",
      "Iter 14/50 - Loss: 0.561\n",
      "Iter 15/50 - Loss: 0.544\n",
      "Iter 16/50 - Loss: 0.526\n",
      "Iter 17/50 - Loss: 0.509\n",
      "Iter 18/50 - Loss: 0.491\n",
      "Iter 19/50 - Loss: 0.474\n",
      "Iter 20/50 - Loss: 0.457\n",
      "Iter 21/50 - Loss: 0.439\n",
      "Iter 22/50 - Loss: 0.422\n",
      "Iter 23/50 - Loss: 0.405\n",
      "Iter 24/50 - Loss: 0.388\n",
      "Iter 25/50 - Loss: 0.372\n",
      "Iter 26/50 - Loss: 0.355\n",
      "Iter 27/50 - Loss: 0.339\n",
      "Iter 28/50 - Loss: 0.322\n",
      "Iter 29/50 - Loss: 0.306\n",
      "Iter 30/50 - Loss: 0.291\n",
      "Iter 31/50 - Loss: 0.276\n",
      "Iter 32/50 - Loss: 0.261\n",
      "Iter 33/50 - Loss: 0.246\n",
      "Iter 34/50 - Loss: 0.232\n",
      "Iter 35/50 - Loss: 0.218\n",
      "Iter 36/50 - Loss: 0.204\n",
      "Iter 37/50 - Loss: 0.191\n",
      "Iter 38/50 - Loss: 0.179\n",
      "Iter 39/50 - Loss: 0.167\n",
      "Iter 40/50 - Loss: 0.155\n",
      "Iter 41/50 - Loss: 0.144\n",
      "Iter 42/50 - Loss: 0.134\n",
      "Iter 43/50 - Loss: 0.124\n",
      "Iter 44/50 - Loss: 0.114\n",
      "Iter 45/50 - Loss: 0.106\n",
      "Iter 46/50 - Loss: 0.097\n",
      "Iter 47/50 - Loss: 0.089\n",
      "Iter 48/50 - Loss: 0.082\n",
      "Iter 49/50 - Loss: 0.075\n",
      "Iter 50/50 - Loss: 0.068\n",
      "CPU times: user 53.2 s, sys: 3.96 s, total: 57.1 s\n",
      "Wall time: 1min 16s\n"
     ]
    }
   ],
   "source": [
    "training_iterations = 2 if smoke_test else 50\n",
    "\n",
    "# Find optimal model hyperparameters\n",
    "model.train()\n",
    "likelihood.train()\n",
    "\n",
    "# Use the adam optimizer\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=0.05)\n",
    "\n",
    "# \"Loss\" for GPs - the marginal log likelihood\n",
    "mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)\n",
    "\n",
    "def train():\n",
    "    for i in range(training_iterations):\n",
    "        # Zero backprop gradients\n",
    "        optimizer.zero_grad()\n",
    "        with gpytorch.settings.use_toeplitz(False), gpytorch.settings.max_root_decomposition_size(30):\n",
    "            # Get output from model\n",
    "            output = model(train_x)\n",
    "            # Calc loss and backprop derivatives\n",
    "            loss = -mll(output, train_y)\n",
    "            loss.backward()\n",
    "        print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))\n",
    "        optimizer.step()\n",
    "        torch.cuda.empty_cache()\n",
    "        \n",
    "%time train()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Making Predictions\n",
    "\n",
    "The next cell makes predictions with SKIP. We use the same max_root_decomposition size, and we also demonstrate increasing the max preconditioner size. Increasing the preconditioner size on this dataset is **not** necessary, but can make a big difference in final test performance, and is often preferable to increasing the number of CG iterations if you can afford the space."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.eval()\n",
    "likelihood.eval()\n",
    "with gpytorch.settings.max_preconditioner_size(10), torch.no_grad():\n",
    "    with gpytorch.settings.max_root_decomposition_size(30), gpytorch.settings.fast_pred_var():\n",
    "        preds = model(test_x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Test MAE: 0.18244513869285583\n"
     ]
    }
   ],
   "source": [
    "print('Test MAE: {}'.format(torch.mean(torch.abs(preds.mean - test_y))))"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
